(57) Abstract

A method (200) and apparatus for determining certain ambient 
conditions in a scene by analyzing a sequence of images (202) that 
represent the scene. The apparatus uses only image information to 
determine scene illumination (200), or the presence of shadows (206), 
fog, smoke, or haze by comparing (208) properties of the detected
objects, averaged over a finite video sequence, against properties of 
the reference image of the scene as that scene would appear without 
any objects present. Such a reference image is constructed in a 
manner similar to time-averaging successive camera images.

METHOD AND APPARATUS FOR DETERMINING AMBIENT
CONDITIONS FROM AN IMAGE SEQUENCE 

This patent application claims benefit of U.S. provisional patent 
application serial number 60/006100 filed October 31, 1995.

The invention relates to image processing techniques and, more 
particularly, to a method and apparatus for detecting objects within a 
sequence of images. 

BACKGROUND OF THE DISCLOSURE

Many computer vision systems for automatic surveillance and 
monitoring seek to detect and segment transitory objects that appear 
temporarily in the system's field of view. Examples include traffic 
monitoring applications that count vehicles and automatic surveillance
systems for security. These systems often require different object detection
and segmentation methods depending on the ambient conditions. An 
example of such a system is disclosed in U.S. Patent Application Serial 
No. 08/372,924 filed January 17, 1995, the disclosure of which is 
incorporated herein by reference. 

The three primary ambient conditions that can create a need for
different detection and segmentation methods are: scene illumination
(for example, whether it is day or night), the presence of shadows, and
whether the scene is obscured by fog, smoke or haze. For example, in a
traffic monitoring system, the detection method at night may be
specialized for detecting headlights and therefore may not be applicable
during the daytime. Also, on bright days, objects may cast shadows that
interfere with accurate object segmentation, and therefore require that
an additional shadow removal method be used. Finally, on very foggy
days, reliable object detection may be impossible, and therefore the
monitoring system operates in a fail-safe mode.

If an automatic monitoring system is to operate autonomously over 
an extended period of time, it should preferably include a method for 
determining the ambient conditions within a scene so that the system can 
use the appropriate detection method in response to those conditions.

Previous work on determining scene illumination, shadow
presence, or fog presence has been based on image analysis. Some
deployed systems predict scene illumination and shadow presence using
an internal clock, knowledge of latitude and longitude, and a
pre-computed calendar of sun positions. This, however, is not robust to
overcast days or foggy weather.

Additionally, approaches based on raw image intensity thresholds are 
very unlikely to be robust. 

Therefore, there is a need for a method and apparatus for 
determining the ambient conditions in a scene including scene
illumination and the presence of shadows, fog, smoke or haze. 

SUMMARY OF THE INVENTION 
The invention is a method and apparatus for determining certain
ambient conditions in a scene by analyzing a sequence of images that
represent the scene. The invention uses only image information to
determine scene illumination, or the presence of shadows, fog, smoke, or
haze by comparing properties of detected objects, averaged over a finite
video sequence, against properties of the reference image of the scene as
that scene would appear without any objects present. Such a reference
image is constructed in a manner similar to time-averaging successive
camera images.

BRIEF DESCRIPTION OF THE DRAWINGS 
The teachings of the present invention can be readily understood by
considering the following detailed description in conjunction with the
accompanying drawings, in which:

Fig. 1 is a block diagram of an image processor of the present 
invention; 

Fig. 2 depicts a flow diagram of a method for determining scene
illumination; and

Fig. 3 depicts a flow diagram of a method for determining ambient 
conditions from energy differences in the sequence of images as compared 
to a reference image.

To facilitate understanding, identical reference numerals have been
used, where possible, to designate identical elements that are common to
the figures.



DETAILED DESCRIPTION

Fig. 1 depicts an image processor 100 for generating a reference
image and an object image, wherein a sequence of images generated by an
image source 102 (e.g., video camera or video storage media) is converted
to digital form in analog-to-digital converter (A/D) 104 and processed using
pyramid processor 106, image stabilizer 108, reference image generator
110, frame store 112, reference image processor 114 and subtracter 116.
The reference image is derived and updated using reference image
generator 110 and frame store 112. The processor 100 produces a
reference image and an object image. The reference image represents
the background imagery of a scene captured by the field of view of a video
camera or some other imaging sensor, while the object image represents
the moving or temporarily stopped object (non-background) imagery of the
scene. The reference and object images are further processed by an object
processor to determine the ambient environment of the scene.

Each image of the image sequence is typically decomposed into a
specified number of Gaussian pyramid levels by pyramid processor 106 for
reducing pixel density and image resolution. Pyramid processor 106 is not
essential, since the apparatus could be operated at the resolution of the
640x480 pixel density of a video camera 102. However, because this
resolution is higher than is needed downstream for the apparatus, the use
of pyramid processor 106 increases the system's computational efficiency.
Not all levels of the pyramid must be used in each computation. Further,
not all levels of the pyramid need be stored between computations, as
higher levels can always be computed from lower ones. However, for
illustrative purposes it is assumed that all of the specified number of
Gaussian pyramid levels are available for each of the downstream
computations discussed below. A preferred pyramid processor 106 is a
pyramid processing circuit described in U.S. Patent No. 5,359,674, the
disclosure of which is incorporated herein by reference.
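
The pyramid decomposition can be illustrated in software. The following is a minimal sketch only, assuming NumPy and a separable 5-tap binomial kernel; the filter taps, the function names `blur` and `gaussian_pyramid`, and the border handling are assumptions for illustration and are not taken from the referenced pyramid processing circuit.

```python
import numpy as np

# Assumption: a 5-tap binomial kernel approximates the Gaussian filter;
# the text does not specify the filter taps.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(image):
    """Separable low-pass filter with edge replication at the borders."""
    height, width = image.shape
    padded = np.pad(image.astype(np.float64), 2, mode="edge")
    # Filter along each row, then along each column.
    rows = sum(KERNEL[k] * padded[:, k:k + width] for k in range(5))
    return sum(KERNEL[k] * rows[k:k + height, :] for k in range(5))

def gaussian_pyramid(image, levels):
    """Return [level 0, level 1, ...]; each level halves the resolution."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        # A higher (coarser) level is computed from the level below it.
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid
```

For a 640x480 input, `gaussian_pyramid(frame, 3)` yields levels of 640x480, 320x240 and 160x120 pixels, so downstream steps can run on whichever level is coarse enough for them.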

Image stabilizer 108 employs electronic image stabilization to
compensate for camera motion and jitter. In general, camera motion
causes pixels in the image to move (i.e., change) without there being
actual object motion in the scene. The stabilizer 108 compensates for
image translation from image-to-image that is due to camera rotation
and/or sway. The stabilizer achieves continuous alignment to within one
pixel between the reference image within frame store 112 and each new
input image. The required shifting of the input image to achieve
alignment is determined using a matched filter to locate two known
landmark features in the scene as captured by the images and aligning all
images with respect to these landmarks.
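
A rough software sketch of landmark-based alignment follows. It assumes NumPy, uses brute-force normalized cross-correlation as a stand-in for the matched filter, and shifts the image with wrap-around; the names `locate`, `stabilizing_shift` and `stabilize` are hypothetical, and none of these choices is stated in the text.

```python
import numpy as np

def locate(image, template):
    """Return the (row, col) where the template correlates best."""
    t = template.astype(np.float64) - template.mean()
    th, tw = t.shape
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw].astype(np.float64)
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * np.linalg.norm(t)
            # Flat patches carry no information, so they never match.
            score = np.sum(p * t) / denom if denom > 0 else -np.inf
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def stabilizing_shift(image, landmarks):
    """landmarks: list of (template, (ref_row, ref_col)) pairs."""
    shifts = [np.subtract(ref_pos, locate(image, template))
              for template, ref_pos in landmarks]
    # Average the per-landmark shifts and round to whole pixels.
    d_row, d_col = np.rint(np.mean(shifts, axis=0)).astype(int)
    return int(d_row), int(d_col)

def stabilize(image, landmarks):
    """Translate the image so the landmarks return to their reference spots.

    Simplification: np.roll wraps pixels around the image border.
    """
    return np.roll(image, stabilizing_shift(image, landmarks), axis=(0, 1))
```

In practice the search would be limited to a small window around each landmark's expected position rather than the whole frame; the exhaustive search here only keeps the sketch short.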

The reference image generator 110 performs a recursive temporal
filtering operation on each corresponding pixel of the successive stabilized
image frames applied as an input thereto. Put mathematically,

$$r_t(x,y) = r_{t-1}(x,y) + g\,[\,i_t(x,y) - r_{t-1}(x,y)\,]$$

where $r_t$ represents the reference image after frame $t$, and $i_t$ represents
the $t$'th frame of the input image frame sequence. The constant $g$
determines the "responsiveness" of the construction process. Other
algorithms for reference image generation may also be used.
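
The recursive filter is a per-pixel exponential moving average. The following is a minimal sketch, assuming NumPy and a floating-point reference image; the function name `update_reference` and the default value of `g` are illustrative assumptions, not values given in the text.

```python
import numpy as np

def update_reference(reference, frame, g=0.05):
    """One filter step: r_t = r_{t-1} + g * (i_t - r_{t-1}).

    reference: float array holding r_{t-1}
    frame:     stabilized input frame i_t (any numeric dtype)
    g:         responsiveness constant (hypothetical default)
    """
    return reference + g * (frame.astype(np.float64) - reference)
```

With `g = 0.1`, a pixel that jumps from 100 to 200 for a single frame moves the reference only to 110, which is why a small `g` keeps short-lived objects out of the background estimate.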

The "responsiveness" setting of $g$ must be sufficiently slow to keep
transitory objects that appear in the scene from being included in the
reference image. As such, after initializing the reference image and
updating the image with a few new input images (i.e., an initialization
phase), the stored reference image in frame store 112 comprises only the
stationary background objects being viewed by the image source. Such a
"responsiveness" setting of $g$ is incapable of adjusting $r_t$ quickly enough to
add illumination changes to the reference image. This problem is solved
using the reference image processor 114.

The processor 114 contains an illumination/AGC compensator that
generates a reference image after the reference image pixel intensities
have been passed through a linear function of the form $k_t x + c_t$, where the
values of $k_t$ and $c_t$ are generated by reference image processor 114 and
respectively represent the estimated gain and offset between the reference
image $r_t$ and the current image $i_t$. Processor 114 computes this gain and
offset by plotting a cloud of points in a 2D space in which the x-axis
represents gray-level intensity in the reference image, and the y-axis
represents gray-level intensity in the current image, and fitting a line to
this cloud. The cloud is the set of points $(r_{t-1}(x,y),\, i_t(x,y))$ for all image
positions $(x,y)$. This approach will work using any method for computing
the gain and offset representing illumination change. For example, the
gain might be estimated by comparing the histograms of the current
image and the reference image.

The above approach allows fast illumination changes, which can
usually be modeled as a gain and offset, to be added to the reference image
while preventing transitory objects from being added. It does so by giving
the reference image processor 114 the flexibility to decide whether the new
reference image pixel values should be computed as a function of pixel
values in the current image or whether they should be computed simply by
applying a gain and offset to the current reference image. By applying a
gain and offset to the current reference image, the illumination change
can be simulated without running the risk of allowing transitory objects to
appear in the reference image.
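
The gain and offset estimate can be sketched as a line fit over the cloud of (reference, current) intensity pairs. This sketch assumes NumPy and an ordinary least-squares fit via `np.polyfit`; the text does not say which fitting method is used, and the function names are hypothetical.

```python
import numpy as np

def estimate_gain_offset(reference, current):
    """Fit current = k * reference + c over all pixel positions.

    Assumption: plain least squares. A robust fit would be less
    affected by pixels covered by transitory objects.
    """
    k, c = np.polyfit(reference.ravel().astype(np.float64),
                      current.ravel().astype(np.float64), 1)
    return float(k), float(c)

def compensate_reference(reference, k, c):
    """Apply the linear map k * x + c to every reference pixel."""
    return k * reference.astype(np.float64) + c
```

Applying `compensate_reference` changes the brightness of the stored background without copying any pixel values from the current frame, so an object present in that frame cannot leak into the reference image.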

The result is that the amplitude of the stationary background
manifesting pixels of the illumination-compensated current image
appearing at the output of reference image processor 114 will always be
substantially equal to the amplitude of the stationary background
manifesting pixels of the reference image, which includes solely
stationary background manifesting pixels, appearing at the output of
frame store 112. Therefore, subtracter 116 is coupled to image stabilizer
108 and processor 114 and produces the difference between the amplitudes
of corresponding pixels applied as inputs thereto. These differences
represent significantly-valued pixels that manifest solely moving objects in
each one of successive 2D image frames. The output of subtracter 116 is
forwarded to an object processor 118.
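
The subtraction step can be sketched as follows, assuming NumPy. The threshold used to decide which differences count as "significantly-valued" is a hypothetical parameter, as is the function name `object_image`; the text gives no numerical value.

```python
import numpy as np

def object_image(stabilized, compensated_reference, threshold=20.0):
    """Return the signed difference image and a non-background mask.

    threshold: hypothetical cut-off for a significant difference.
    """
    diff = stabilized.astype(np.float64) - compensated_reference
    mask = np.abs(diff) > threshold
    return diff, mask
```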

The object processor comprises an object detection and tracking
processor 120 that conventionally detects the objects in the scene and
tracks the objects as each object moves over time. To facilitate and
improve object tracking, the present invention provides a scene
illumination processor 122 that analyzes the object image to determine the
illumination environment (e.g., day, night, fog and the like) of the scene.
This illumination information is coupled to the object detection and
tracking processor 120 such that the processor 120 can select appropriate
detection and tracking routines that perform optimally in the present
scene illumination environment.

Fig. 2 depicts a flow diagram of a routine 200 for determining scene
illumination. At step 202, each of a set of non-background pixels in an
image I (the object image) is classified as "bright" or "dark", depending on
whether the total image intensity in some limited region centered on it is
greater than or less than the total image intensity in the same region in
the reference image R. This assumes the reference image R is aligned
pixel-for-pixel with the image I.

Over some finite number of images from a sequence of video images
representing the sampling period on which the day/night determination is
to be based, the total numbers of dark ($n_{dark}$) and bright ($n_{bright}$)
non-background pixels detected are determined relative to a reference
intensity. The intensity is the brightness of the pixel itself. At step 204,
the number of pixels classified as bright is counted and, at step 206, the
number of pixels classified as dark is counted. The sum
$n_{dark} + n_{bright}$ yields the total number of non-background pixels that
were detected.

Using comparator step 208, the scene can be classified as well-lit or
poorly-lit, or, in outdoor scenes, daytime or nighttime, using the fraction

$$f_{dark} = \frac{n_{dark}}{n_{dark} + n_{bright}}$$

Assuming that the objects being detected have surface colors obtained
from a uniform distribution, and that the mean background intensity is
roughly in the middle of this distribution, then $f_{dark}$ should be
approximately 0.5. However, if the scene is poorly-lit, the background
image will be dark, and it will be difficult to detect any pixel with a dark
surface color. Under this condition $f_{dark}$ becomes small. Therefore, a
simple test can be used to determine whether the scene is well- or
poorly-lit: if $f_{dark} > \alpha$, then the scene is well-lit. Otherwise it is
poorly-lit. In practice, using $\alpha = 0.2$ has proven to make accurate
determinations for outdoor scenes.
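
The bright/dark classification and the day/night test can be sketched as below, assuming NumPy. The square window of radius `radius`, the treatment of ties as "dark", and all function names are assumptions made for illustration; only the threshold of 0.2 comes from the text.

```python
import numpy as np

def box_sum(image, radius):
    """Total intensity in a (2*radius+1)^2 window centred on each pixel."""
    size = 2 * radius + 1
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (integral[size:, size:] - integral[:-size, size:]
            - integral[size:, :-size] + integral[:-size, :-size])

def classify_bright_dark(image, reference, object_mask, radius=1):
    """Split non-background pixels into bright and dark boolean masks."""
    brighter = box_sum(image, radius) > box_sum(reference, radius)
    # Assumption: a tie with the reference is counted as dark.
    return object_mask & brighter, object_mask & ~brighter

def is_well_lit(frames, reference, masks, alpha=0.2, radius=1):
    """Return True/False, or None when no object pixels were seen."""
    n_bright = n_dark = 0
    for frame, mask in zip(frames, masks):
        bright, dark = classify_bright_dark(frame, reference, mask, radius)
        n_bright += int(bright.sum())
        n_dark += int(dark.sum())
    total = n_bright + n_dark
    if total == 0:
        return None
    return n_dark / total > alpha
```

At night, headlights against a dark background produce almost exclusively bright detections, so the dark fraction falls toward zero and the test reports a poorly-lit scene.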

A different routine is used to detect fog/haze and shadow presence
in the scenes. Fig. 3 depicts a flow diagram of a routine 300 for ambient
condition detection (e.g., shadow/fog/haze detection).

Specifically, the presence of shadows is also determined using the
concepts of bright and dark non-background pixels. At step 302, the pixels
(or regions) of the object image are separated into bright and dark regions.
At steps 304 and 306, for each non-background pixel having the
coordinates $(x,y)$ in an image $G$, the routine defines the "energy" at $(x,y)$
as a function of the intensity differences between pixels near $(x,y)$. In
practice, the following energy measure has been used:

$$\mathrm{energy}(G,x,y) = \frac{\sum_{(x',y') \in W(x,y)} \left| G(x',y') - G(x'-1,y') \right|}{|W(x,y)|}$$

where $G(x,y)$ is the intensity of the pixel at position $(x,y)$ in the image $G$
and $W(x,y)$ represents a "windowing" function producing a set of pixels
which are taken to be the "neighbors" of pixel $(x,y)$, and where $|W(x,y)|$ is
the cardinality of this set. Other energy functions can also be used.

Given an image $I$ (object image) and the reference $R$ (reference
image), the energy difference at a pixel $(x,y)$ is defined as

$$\mathrm{ediff}(x,y) = \left| \mathrm{energy}(I,x,y) - \mathrm{energy}(R,x,y) \right|$$
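
The energy measure and the energy difference can be sketched per pixel as follows. This assumes NumPy, images at least two pixels wide, and a square window clipped to the image; reading the measure as a mean absolute horizontal difference, the window shape and the function names are assumptions for illustration.

```python
import numpy as np

def energy(image, x, y, radius=1):
    """Mean absolute horizontal intensity difference over W(x, y)."""
    g = image.astype(np.float64)
    height, width = g.shape
    # Assumption: W(x, y) is a square window clipped to the image. The
    # column x' = 0 is skipped because it has no left-hand neighbour.
    xs = range(max(x - radius, 1), min(x + radius, width - 1) + 1)
    ys = range(max(y - radius, 0), min(y + radius, height - 1) + 1)
    diffs = [abs(g[yy, xx] - g[yy, xx - 1]) for yy in ys for xx in xs]
    return sum(diffs) / len(diffs)

def ediff(image, reference, x, y, radius=1):
    """Absolute energy difference between image and reference at (x, y)."""
    return abs(energy(image, x, y, radius) - energy(reference, x, y, radius))
```

On a flat reference the energy is zero, so `ediff` at an object boundary reduces to the local contrast the object introduces.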
Over the same sample time period used to determine day/night, the mean
energy difference of all the bright non-background pixels and the mean
energy difference of all the dark non-background pixels are
computed and denoted as $E_{bright}$ and $E_{dark}$, respectively. For
computational efficiency, one can classify a sparse sampling of the pixels
rather than attempting to classify each pixel. This sparse sampling can
be specified by the user, using an image mask, so that the sampling
occurs only in those places where transitory objects are likely to appear.
As an example, in a traffic monitoring application, a sampling mask that
restricts sampling to image locations that fall on the roadway can be
used.
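
Accumulating the two means over a sequence, restricted by a sampling mask, might look like the sketch below. It assumes NumPy and that a per-frame energy-difference map and the bright and dark masks are already available; the function name and the input layout are hypothetical.

```python
import numpy as np

def mean_energy_differences(samples, sample_mask):
    """Compute (E_bright, E_dark) over a sequence of frames.

    samples:     iterable of (ediff_map, bright_mask, dark_mask) per frame
    sample_mask: boolean mask of positions to sample, e.g. the roadway
    Returns None for a class in which no pixel was sampled.
    """
    bright_vals, dark_vals = [], []
    for ediff_map, bright, dark in samples:
        bright_vals.append(ediff_map[bright & sample_mask])
        dark_vals.append(ediff_map[dark & sample_mask])
    bright_all = np.concatenate(bright_vals) if bright_vals else np.array([])
    dark_all = np.concatenate(dark_vals) if dark_vals else np.array([])
    e_bright = float(bright_all.mean()) if bright_all.size else None
    e_dark = float(dark_all.mean()) if dark_all.size else None
    return e_bright, e_dark
```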

The size of the set produced by the neighborhood function W that is
used to compute the energy may vary with pixel location (x,y) or may be a
fixed size independent of location. W may be derived from the same mask
that is used to select the pixels that are to be classified as background or
non-background. The same principle may be applied within the routine
200 for determining scene illumination.

At step 308, the routine processes the values $E_{dark}$ and $E_{bright}$
to detect the presence of shadows in the scene. Specifically, empirical
study has shown that under typical shadow conditions on smooth
background surfaces such as roadways, $E_{dark}/E_{bright} > 1.2$. Under diffuse
illumination, for example on an overcast day in outdoor scenes, or when
shadows are quite short, $E_{dark}/E_{bright}$ is significantly smaller. This is
because the neighborhood window function W used in the energy
measure typically spans both the object and the background, and therefore
the energy function measures the contrast between the object and the
background. In brightly-lit scenes where shadows are likely, the
background is likely to appear quite bright, and therefore bright objects
will contrast less sharply against the background than will dark objects.

At step 310, the presence of fog, haze, or smoke is detected by
examining the magnitudes of $E_{bright}$ and $E_{dark}$. In well-lit conditions, $E_{dark}$
should be greater than some minimum; otherwise the scene has poor
contrast and fog is likely. In poorly-lit conditions, $E_{bright}$ should be greater
than some minimum; otherwise, fog is likely.
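
The two decisions of steps 308 and 310 can be sketched as simple threshold tests. The ratio threshold of 1.2 is taken from the text; the `min_energy` parameter is hypothetical because no numerical minimum is given, and applying the well-lit test to the dark-pixel energy is an assumption of this sketch.

```python
def shadows_present(e_bright, e_dark, ratio_threshold=1.2):
    """True when E_dark / E_bright exceeds the empirical ratio.

    Written without a division so that e_bright == 0 is handled.
    """
    return e_dark > ratio_threshold * e_bright

def fog_likely(e_bright, e_dark, well_lit, min_energy):
    """True when the relevant mean energy fails to exceed min_energy.

    Assumption: the dark-pixel energy is tested in well-lit scenes and
    the bright-pixel energy in poorly-lit scenes. min_energy is a
    hypothetical tuning parameter.
    """
    relevant = e_dark if well_lit else e_bright
    return relevant <= min_energy
```

A monitoring system could call `shadows_present` to decide whether to enable a shadow removal method and `fog_likely` to decide whether to drop into a fail-safe mode.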

It is to be understood that the apparatus and method of operation
taught herein are illustrative of the invention. Modifications may readily
be devised by those skilled in the art without departing from the spirit or
scope of the invention. The invention can be used in any system for
automatic surveillance and monitoring or wherever a need exists to
determine ambient conditions from a sequence of images.

What is claimed is:

1. A method for detecting ambient conditions of a scene represented by a
sequence of images comprising the steps of:

generating a reference image containing background information
regarding the scene;

comparing, pixel-by-pixel, each of said images in said sequence of
images to said reference image;

classifying, in response to each comparison, the pixels of said
images as either background or non-background;

comparing a brightness measure of each non-background pixel,
computed over a neighborhood of pixels local to that pixel, against a
threshold; and

processing each non-background pixel to determine the ambient
conditions of the scene.

2. The method of claim 1 wherein the processing step further comprises:

classifying, in response to said comparison, each of said
non-background pixels as either bright or dark;

summing the number of bright and dark pixels over a number of images;
and

processing the sums of bright and dark pixels to determine if the scene
is well-lit or poorly-lit.

3. The method of claim 1 wherein the threshold is the brightness measure 
obtained over a corresponding neighborhood of pixels in the reference 
image. 

4. The method of claim 1 wherein the processing step further comprises
the steps of:

classifying, in response to said comparison, each of said
non-background pixels as either bright or dark;

comparing an energy value of each pixel in said image to an energy
value in each pixel of said reference image;

determining a mean energy difference of all bright non-background
pixels and a mean energy difference for all dark non-background pixels;
and

processing the mean energy difference of the dark and bright pixels to
determine if the scene contains shadows.

5. The method of claim 1 wherein the processing step further comprises
the steps of:

classifying, in response to said comparison, each of said
non-background pixels as either bright or dark;

comparing an energy value of each pixel in said image to an energy
value in each pixel of said reference image;

determining a mean energy difference of all bright non-background
pixels and a mean energy difference for all dark non-background pixels;

determining an absolute magnitude of the mean energy difference
of all bright non-background pixels and an absolute magnitude of the
mean energy difference for all dark non-background pixels; and

processing the absolute magnitude of the mean energy difference of the
dark and bright pixels to determine if the scene contains fog or haze.

6. Apparatus (100) for detecting ambient conditions of a scene represented
by a sequence of images comprising:

a reference image generator (110) for generating a reference image
containing background information regarding the scene;

means (116), coupled to said reference image generator, for
comparing, pixel-by-pixel, each of said images in said sequence of images
to said reference image;

an illumination processor (122), coupled to said comparing means,
for classifying, in response to each comparison, the pixels of said images
as either background or non-background, for comparing a brightness
measure of each non-background pixel, computed over a neighborhood of
pixels local to that pixel, against a threshold, and for processing each
non-background pixel to determine the ambient conditions of the scene.

7. The apparatus of claim 6 wherein the illumination processor further
comprises:

means (202) for classifying, in response to said comparison, each of
said non-background pixels as either bright or dark;

means (204, 206) for summing the number of bright and dark pixels
over a number of images; and

means (208) for processing the sums of bright and dark pixels to
determine if the scene is well-lit or poorly-lit.

8. The apparatus of claim 6 wherein the threshold is the brightness 
measure obtained over a corresponding neighborhood of pixels in the 
reference image. 

9. The apparatus of claim 6 wherein the illumination processor further
comprises:

means (302) for comparing an energy value of each pixel in said
image to an energy value in each pixel of said reference image;

means (304, 306) for determining a mean energy difference of all
bright non-background pixels and a mean energy difference for all dark
non-background pixels; and

means (308) for processing the mean energy difference of the dark
and bright pixels to determine if the scene contains shadows.

10. The apparatus of claim 6 wherein the illumination processor further
comprises:

means (302) for classifying, in response to said comparison, each of
said non-background pixels as either bright or dark;

means (304, 306) for determining a mean energy difference of all
bright non-background pixels and a mean energy difference for all dark
non-background pixels;

means (310) for determining an absolute magnitude of the mean
energy difference of all bright non-background pixels and an absolute
magnitude of the mean energy difference for all dark non-background
pixels; and

means (310) for processing the absolute magnitude of the mean
energy difference of the dark and bright pixels to determine if the scene
contains fog or haze.